Conversation

@ajrasane
Contributor

What does this PR do?

Type of change:
New Feature

Overview:

  • Created an abstract parent class for ONNXQuantExporter
  • Created child classes for individual precisions
  • Implemented the INT4QuantExporter
  • Removed quantize_weights_to_int4
  • Added a method to quantize weights of the ONNX model to low precision
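The exporter hierarchy described above could be sketched as follows. This is a minimal illustration, not the PR's actual code: the method signature is hypothetical, and the INT4 scheme shown is generic symmetric per-tensor quantization rather than the AWQ calibration used by `int4_awq`.

```python
from abc import ABC, abstractmethod

import numpy as np


class ONNXQuantExporter(ABC):
    """Abstract parent class: one child class per target precision."""

    @abstractmethod
    def quantize_weights(self, weights: np.ndarray) -> tuple[np.ndarray, float]:
        """Quantize a weight tensor; return (integer weights, scale)."""


class INT4QuantExporter(ONNXQuantExporter):
    """Symmetric per-tensor INT4: map values into the range [-8, 7]."""

    def quantize_weights(self, weights: np.ndarray) -> tuple[np.ndarray, float]:
        # Scale so the largest magnitude maps to 7; guard against all-zero tensors.
        scale = float(np.abs(weights).max()) / 7.0 or 1.0
        q = np.clip(np.round(weights / scale), -8, 7).astype(np.int8)
        return q, scale
```

The abstract base lets `torch_quant_to_onnx.py` dispatch on `--quantize_mode` without precision-specific branches at the call site; NVFP4/MXFP8 would slot in as further subclasses.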

Testing

python torch_quant_to_onnx.py --quantize_mode=int4_awq \
	--onnx_save_path=<onnx_path>

Before your PR is "Ready for review"

  • Make sure you read and follow Contributor guidelines and your commits are signed.
  • Is this change backward compatible?: Yes
  • Did you write any new necessary tests?: No
  • Did you add or update any necessary documentation?: Yes
  • Did you update Changelog?: No

@ajrasane ajrasane self-assigned this Nov 18, 2025
@ajrasane ajrasane requested review from a team as code owners November 18, 2025 18:14
@ajrasane ajrasane requested a review from i-riyad November 18, 2025 18:14
@codecov

codecov bot commented Nov 18, 2025

Codecov Report

❌ Patch coverage is 16.66667% with 15 lines in your changes missing coverage. Please review.
✅ Project coverage is 74.61%. Comparing base (7a36ccc) to head (9a45ddb).
⚠️ Report is 6 commits behind head on main.

Files with missing lines Patch % Lines
modelopt/torch/_deploy/utils/torch_onnx.py 11.76% 15 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #575      +/-   ##
==========================================
+ Coverage   74.57%   74.61%   +0.04%     
==========================================
  Files         183      183              
  Lines       18412    18546     +134     
==========================================
+ Hits        13730    13839     +109     
- Misses       4682     4707      +25     

☔ View full report in Codecov by Sentry.


@gcunhase
Contributor

If this PR is just for INT4, and NVFP4 and MXFP8 are WIP, can you please update the title accordingly? Thanks!

@ajrasane ajrasane changed the title [OMNIML-2244] Create the ONNX quantization exporter [OMNIML-2244] Implement the ONNX quantization exporter for INT4 Nov 19, 2025
@ajrasane ajrasane requested a review from galagam November 19, 2025 11:03
@ajrasane ajrasane force-pushed the ajrasane/mixed_precision branch from e829998 to 83028fa Compare November 20, 2025 20:34
@ajrasane ajrasane force-pushed the ajrasane/mixed_precision branch from 83028fa to 530de9b Compare November 24, 2025 23:18
Signed-off-by: ajrasane <[email protected]>
@ajrasane ajrasane force-pushed the ajrasane/mixed_precision branch from 530de9b to a4c3e31 Compare November 24, 2025 23:24
@ajrasane ajrasane force-pushed the ajrasane/mixed_precision branch from f4e6f50 to 88567b1 Compare November 26, 2025 03:18
Contributor

@galagam galagam left a comment


Looks good after the last commit. Approved.

next_node = cast_child_nodes[0]

# Store transpose permutation if present
if next_node.op_type == "Transpose":
Collaborator

nit: elif

Contributor Author

We will need to call this after the Cast node is processed.
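The exchange above can be illustrated with a small sketch (the helper name and node objects are hypothetical, not the PR's code): because the walk first advances past a Cast node, the Transpose check must run on the node that follows the Cast, which an `elif` on the original node would skip.

```python
from types import SimpleNamespace


def find_transpose_perm(node, cast_child_nodes):
    """Hypothetical sketch: step over a Cast, then look for a Transpose.

    An `elif` on the second check would never fire for a Cast -> Transpose
    chain, because the Cast branch would already have been taken.
    """
    if node.op_type == "Cast" and cast_child_nodes:
        node = cast_child_nodes[0]  # advance to the Cast's child first
    if node.op_type == "Transpose":  # re-check the (possibly new) node
        return list(node.perm)
    return None
```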

@ajrasane ajrasane enabled auto-merge (squash) November 26, 2025 19:44
@ajrasane ajrasane merged commit 0a4f0a8 into main Nov 26, 2025
27 checks passed
@ajrasane ajrasane deleted the ajrasane/mixed_precision branch November 26, 2025 20:50

5 participants